
Feat(ai-gemini): Gemini Realtime Adapter #405

Open
nikas-belogolov wants to merge 7 commits into TanStack:main from nikas-belogolov:feat/gemini-realtime-chat

Conversation


@nikas-belogolov (Contributor) commented Mar 29, 2026

🎯 Changes

  • Added realtime ephemeral token generation

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features
    • Gemini realtime provider added: selectable in the UI for realtime audio conversations with voice, audio capture/playback, and tool integration.
    • Tools UI now available for Gemini as well as OpenAI.
    • Provider-specific session options added to customize realtime behavior.
    • Tool configuration can include an output schema for richer tool responses.


coderabbitai bot commented Mar 29, 2026

📝 Walkthrough
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 68.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Title check ✅ Passed — The title 'Feat(ai-gemini): Gemini Realtime Adapter' accurately and clearly describes the main change: adding a Gemini Realtime Adapter to the ai-gemini package.
  • Description check ✅ Passed — The description follows the required template with all sections present and boxes checked, but it provides minimal detail ('Added realtime ephemeral token generation') without explaining the full scope of the Gemini Realtime Adapter.



nikas-belogolov commented Mar 29, 2026

There should be some discussion around how to implement the WebSocket connection: it could be a direct client-to-server connection (straight to the realtime API), or a client-server-server setup in which the web server (e.g. TanStack Start) acts as a proxy.
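To make the proxied topology concrete, here is a hedged sketch of the handshake most realtime providers suggest: the web server only exchanges the long-lived API key for a short-lived ephemeral token, and the browser then opens the WebSocket directly. `TokenMinter` and `handleTokenRequest` are illustrative stand-ins, not this PR's API.

```typescript
// Hypothetical sketch of the client -> web server -> realtime-API handshake.
// The server never proxies audio frames; it only mints a short-lived token.
type EphemeralToken = { token: string; expiresAt: number }
type TokenMinter = (apiKey: string) => EphemeralToken // stand-in for a provider SDK call

function handleTokenRequest(
  mint: TokenMinter,
  apiKey: string, // read from server-side env; never shipped to the client
): { status: number; body: EphemeralToken } {
  // Only the short-lived token crosses the wire to the browser.
  return { status: 200, body: mint(apiKey) }
}
```

The alternative (full proxy) keeps the token server-side entirely but requires the web server to relay every audio frame in both directions.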


nx-cloud bot commented Mar 29, 2026

🤖 Nx Cloud AI Fix Eligible

An automatically generated fix could have resolved the failing tasks in this run, but Self-healing CI is disabled for this workspace. Visit workspace settings to enable it and get automatic fixes in future runs.

To disable these notifications, a workspace admin can disable them in workspace settings.


View your CI Pipeline Execution ↗ for commit f40f0c8

Command | Status | Duration
  • nx affected --targets=test:sherif,test:knip,tes... — ❌ Failed (2m 47s)
  • nx run-many --targets=build --exclude=examples/** — ❌ Failed (48s)

☁️ Nx Cloud last updated this comment at 2026-03-29 10:28:30 UTC

@nikas-belogolov nikas-belogolov marked this pull request as ready for review April 13, 2026 19:46
@nikas-belogolov nikas-belogolov requested a review from a team April 13, 2026 19:46

@coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-client/src/realtime-client.ts (1)

532-540: ⚠️ Potential issue | 🟡 Minor

Inconsistent tool mapping: applySessionConfig omits outputSchema.

The tool mapping in connect() (lines 104-114) includes both inputSchema and outputSchema, but applySessionConfig() only includes inputSchema. This inconsistency could lead to missing output schemas when the session is updated after initial connection.

🐛 Proposed fix for consistency
     const toolsConfig = tools
       ? Array.from(this.clientTools.values()).map((t) => ({
           name: t.name,
           description: t.description,
           inputSchema: t.inputSchema
             ? convertSchemaToJsonSchema(t.inputSchema)
             : undefined,
+          outputSchema: t.outputSchema
+            ? convertSchemaToJsonSchema(t.outputSchema)
+            : undefined,
         }))
       : undefined
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 532 - 540,
The applySessionConfig mapping for tools omits outputSchema, causing
inconsistency with connect(); update applySessionConfig (the code building
toolsConfig from this.clientTools.values()) to include outputSchema the same way
inputSchema is handled by calling convertSchemaToJsonSchema on t.outputSchema
when present, so toolsConfig contains both inputSchema and outputSchema
(mirroring the mapping in connect()).
🧹 Nitpick comments (3)
packages/typescript/ai-client/src/realtime-client.ts (1)

99-102: Remove commented-out code.

This dead code should be removed to keep the codebase clean.

🧹 Proposed removal
-      // const toolsList =
-      //   this.clientTools.size > 0
-      //     ? Array.from(this.clientTools.values())
-      //     : undefined
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 99 - 102,
Remove the dead/commented-out block that defines toolsList in realtime-client.ts
(the lines referencing toolsList and this.clientTools) — delete the commented
code entirely so no leftover commented declarations remain; ensure there are no
other references to the removed snippet (search for toolsList and usages of
this.clientTools in the surrounding code) and run lint/format to keep the file
clean.
packages/typescript/ai-gemini/src/index.ts (1)

86-90: Consider exporting Gemini realtime types from the main entry point.

The ./realtime/index module exports types (GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions) that aren't re-exported here. Other adapters (text, summarize, image, tts) export their configuration types from the main entry point for consumer convenience.

♻️ Proposed addition for type exports
 // Realtime adapter
 export {
   geminiRealtime,
   geminiRealtimeToken,
 } from './realtime/index'
+export type {
+  GeminiRealtimeModel,
+  GeminiRealtimeTokenOptions,
+  GeminiRealtimeOptions,
+} from './realtime/index'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/index.ts` around lines 86 - 90, Export the
realtime types from the main entry point so consumers can import them alongside
geminiRealtime and geminiRealtimeToken; add a type-only re-export like "export
type { GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions }
from './realtime/index'" in the same file that currently exports geminiRealtime
and geminiRealtimeToken. Ensure you use a type-only export to avoid runtime
bundle changes and reference the exact type names GeminiRealtimeModel,
GeminiRealtimeTokenOptions, and GeminiRealtimeOptions so IDEs and consumers can
import them directly.
examples/ts-react-chat/src/routes/realtime.tsx (1)

279-293: Consider enabling additional configuration options for Gemini.

The tools indicator now correctly appears for Gemini. However, other session configuration options (output mode, temperature, semantic eagerness) remain OpenAI-only in the UI, but the underlying RealtimeSessionConfig supports these for all providers. Consider enabling some of these controls for Gemini if the Gemini Live API supports them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 279 - 293, The
UI currently gates several session configuration controls to provider ===
'openai'; update those conditionals to also include 'gemini' so that controls
for output mode, temperature, and semanticEagerness are shown when provider ===
'gemini' (same as how you added the tools indicator for Gemini). Locate the
conditional checks and UI blocks in the realtime component that reference
provider (and the controls bound to RealtimeSessionConfig such as outputMode,
temperature, semanticEagerness) and extend their logic to allow 'gemini'; ensure
the form bindings still map to RealtimeSessionConfig fields so changes propagate
to the session payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/ts-react-chat/src/lib/use-realtime.ts`:
- Around line 65-69: The Gemini branch is using shared OpenAI defaults (voice:
'alloy' and unconstrained responseModalities) which Gemini Live rejects; update
the geminiRealtime() path so it supplies Gemini-compatible defaults: set voice
to a valid Gemini prebuilt name (e.g., 'Puck' or another supported name) instead
of 'alloy', and set responseModalities to an explicit single-element array
(e.g., [Modality.AUDIO]) for the session; ensure geminiRealtime (or its caller
in useRealtime) overrides the shared defaults and enforces only one modality so
the Gemini connection will succeed.

In `@packages/typescript/ai-client/src/realtime-types.ts`:
- Around line 28-34: The ElevenLabs realtime adapter's connect function still
uses the old signature with clientToolDefs?: ReadonlyArray<AnyClientTool>, which
no longer matches the RealtimeAdapter.connect signature (token: RealtimeToken,
config: RealtimeSessionConfig) and causes type errors; update the connect method
in packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (the function named
connect or the class implementing RealtimeAdapter) to accept (token:
RealtimeToken, config: RealtimeSessionConfig) and remove the legacy
clientToolDefs parameter, update any internal usages/calls within that adapter
to read configuration from the provided RealtimeSessionConfig, and ensure the
exported adapter type still satisfies RealtimeAdapter.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 96-112: liveConfig is missing transcription flags so Gemini won't
populate inputTranscription/outputTranscription; update the LiveConnectConfig
object built in the liveConfig variable to include inputAudioTranscription: {}
and outputAudioTranscription: {} (use the same config.providerOptions merge
pattern so you don't overwrite existing keys) so the API will emit transcription
data that the adapter reads when handling inputTranscription/outputTranscription
events.
- Around line 466-473: The sendImage function currently calls
session.sendRealtimeInput with a non-supported media property; update sendImage
to pass the image under the modality-specific video field instead of media or
deprecated mediaChunks (e.g., call session.sendRealtimeInput({ video: { /*
include the image bytes and mimeType as the frame payload */ } })). Locate
sendImage and replace the media object with a video object containing the image
data and mimeType in the API's expected frame/payload shape so the
session.sendRealtimeInput call uses video rather than media or mediaChunks.
- Around line 179-185: The convertFloat32ToInt16 function currently returns
buf.toString() (comma-separated integers); change it to produce a base64-encoded
string of the raw 16-bit PCM bytes suitable for the Gemini Live API audio.data
field (e.g., MIME audio/pcm;rate=16000). Convert the Float32Array to an
Int16Array (clamping and scaling as done now), then create a Uint8Array view
over the Int16Array's buffer (ensuring correct endianness), encode that byte
array to base64, and return the base64 string so audio.data contains raw PCM
bytes rather than CSV integers.
- Around line 188-190: The GoogleGenAI client is being constructed with an
ephemeral token (token.token) but missing the v1alpha API version; update the
initialization of GoogleGenAI (the ai instance) to include httpOptions: {
apiVersion: 'v1alpha' } when using ephemeral tokens so Live API requests use the
v1alpha endpoint, e.g., detect where new GoogleGenAI({ apiKey: token.token }) is
created and add the httpOptions.apiVersion flag.

In `@packages/typescript/ai-gemini/src/realtime/token.ts`:
- Around line 41-42: The default model string is invalid for the
GeminiRealtimeModel union: change the default assigned to the local variable
model (currently set via options.model ?? 'gemini-live-2.5-flash-native-audio')
to one of the allowed union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.

---

Outside diff comments:
In `@packages/typescript/ai-client/src/realtime-client.ts`:
- Around line 532-540: The applySessionConfig mapping for tools omits
outputSchema, causing inconsistency with connect(); update applySessionConfig
(the code building toolsConfig from this.clientTools.values()) to include
outputSchema the same way inputSchema is handled by calling
convertSchemaToJsonSchema on t.outputSchema when present, so toolsConfig
contains both inputSchema and outputSchema (mirroring the mapping in connect()).

---

Nitpick comments:
In `@examples/ts-react-chat/src/routes/realtime.tsx`:
- Around line 279-293: The UI currently gates several session configuration
controls to provider === 'openai'; update those conditionals to also include
'gemini' so that controls for output mode, temperature, and semanticEagerness
are shown when provider === 'gemini' (same as how you added the tools indicator
for Gemini). Locate the conditional checks and UI blocks in the realtime
component that reference provider (and the controls bound to
RealtimeSessionConfig such as outputMode, temperature, semanticEagerness) and
extend their logic to allow 'gemini'; ensure the form bindings still map to
RealtimeSessionConfig fields so changes propagate to the session payload.

In `@packages/typescript/ai-client/src/realtime-client.ts`:
- Around line 99-102: Remove the dead/commented-out block that defines toolsList
in realtime-client.ts (the lines referencing toolsList and this.clientTools) —
delete the commented code entirely so no leftover commented declarations remain;
ensure there are no other references to the removed snippet (search for
toolsList and usages of this.clientTools in the surrounding code) and run
lint/format to keep the file clean.

In `@packages/typescript/ai-gemini/src/index.ts`:
- Around line 86-90: Export the realtime types from the main entry point so
consumers can import them alongside geminiRealtime and geminiRealtimeToken; add
a type-only re-export like "export type { GeminiRealtimeModel,
GeminiRealtimeTokenOptions, GeminiRealtimeOptions } from './realtime/index'" in
the same file that currently exports geminiRealtime and geminiRealtimeToken.
Ensure you use a type-only export to avoid runtime bundle changes and reference
the exact type names GeminiRealtimeModel, GeminiRealtimeTokenOptions, and
GeminiRealtimeOptions so IDEs and consumers can import them directly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 26ce7bc3-6e10-47ce-adc7-516c0d88c602

📥 Commits

Reviewing files that changed from the base of the PR and between a8a4465 and dadfae9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • .changeset/huge-lizards-admire.md
  • examples/ts-react-chat/src/lib/use-realtime.ts
  • examples/ts-react-chat/src/routes/realtime.tsx
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-gemini/package.json
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/index.ts
  • packages/typescript/ai-gemini/src/realtime/token.ts
  • packages/typescript/ai-gemini/src/realtime/types.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts
  • packages/typescript/ai/src/realtime/types.ts

Comment on lines +65 to +69
provider === 'openai'
  ? openaiRealtime()
  : provider === 'gemini'
    ? geminiRealtime()
    : elevenlabsRealtime()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gemini Live API JavaScript supported responseModalities per session and supported prebuilt voice names

💡 Result:

In the Gemini Live API JavaScript SDK (@google/genai), supported responseModalities are specified as an array in the session config, e.g., responseModalities: [Modality.AUDIO] or [Modality.TEXT]. These are set per session via LiveConnectConfig and determine the output types (AUDIO and/or TEXT) for that session. Native audio models primarily support AUDIO, while others support combinations including TEXT. Supported prebuilt voice names (used in speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName) include 30 HD voices: Zephyr (Bright), Puck (Upbeat, default), Charon (Informative), Kore (Firm), Fenrir (Excitable), Leda (Youthful), Orus (Firm), Aoede (Breezy), Callirrhoe (Easy-going), Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear), Laomedeia (Upbeat), Achernar (Soft), Alnilam (Firm), Schedar (Even), Gacrux (Mature), Pulcherrima (Forward), Sadachbia (Lively), Sadaltager (Knowledgeable), Sulafat (Warm), Umbriel (Easy-going), Erinome (Clear), Achird (Friendly), Algieba (Smooth), Algenib (Gravelly), Zubenelgenubi (Casual), Despina (Smooth), Rasalgethi (Informative), Vindemiatrix (Gentle). These are shared with TTS and configurable per session for audio responses.



🏁 Script executed:

cat -n examples/ts-react-chat/src/lib/use-realtime.ts

Repository: TanStack/ai

Length of output: 3793


Update Gemini adapter to use Gemini-compatible voice and responseModalities configuration.

The geminiRealtime() adapter selected at line 68 is routed through shared defaults below that are incompatible with Gemini Live. Line 92 specifies voice: 'alloy' (an OpenAI voice), but Gemini Live only supports 30 prebuilt voices with names like Puck (default), Kore, Aoede, etc. Additionally, Gemini Live requires responseModalities to be explicitly set per session as an array (e.g., [Modality.AUDIO]) and only supports a single response modality per session, whereas the current config allows unconstrained modalities. The Gemini branch will fail on connection with these defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/lib/use-realtime.ts` around lines 65 - 69, The
Gemini branch is using shared OpenAI defaults (voice: 'alloy' and unconstrained
responseModalities) which Gemini Live rejects; update the geminiRealtime() path
so it supplies Gemini-compatible defaults: set voice to a valid Gemini prebuilt
name (e.g., 'Puck' or another supported name) instead of 'alloy', and set
responseModalities to an explicit single-element array (e.g., [Modality.AUDIO])
for the session; ensure geminiRealtime (or its caller in useRealtime) overrides
the shared defaults and enforces only one modality so the Gemini connection will
succeed.
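For reference, a hedged sketch of the Gemini-compatible session defaults implied by the finding above. The field names follow the `@google/genai` shape quoted in the web-query result; the plain 'AUDIO' string stands in for the SDK's `Modality.AUDIO` enum value, and 'Puck' is the documented default prebuilt voice.

```typescript
// Hypothetical Gemini-friendly defaults: exactly one response modality per
// session, and a Gemini prebuilt voice name instead of OpenAI's 'alloy'.
const geminiSessionDefaults = {
  responseModalities: ['AUDIO'], // Gemini Live allows a single modality per session
  speechConfig: {
    voiceConfig: {
      prebuiltVoiceConfig: { voiceName: 'Puck' },
    },
  },
}
```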

Comment on lines +96 to +112
const liveConfig: LiveConnectConfig = {
  responseModalities,
  tools: [{
    functionDeclarations: config.tools
  }],
  speechConfig: {
    voiceConfig: {
      prebuiltVoiceConfig: {
        voiceName: config.voice
      }
    }
  },
  maxOutputTokens: config.maxOutputTokens !== 'inf' ? config.maxOutputTokens : undefined,
  systemInstruction: config.instructions,
  temperature: config.temperature,
  ...config.providerOptions
};

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gemini Live API JavaScript how to enable input and output audio transcription in LiveConnectConfig

💡 Result:

To enable input and output audio transcription in the Gemini Live API using JavaScript with the @google/genai SDK, include inputAudioTranscription: {} and outputAudioTranscription: {} in the config object passed to ai.live.connect. Example code: import { GoogleGenAI, Modality } from '@google/genai'; const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' }); const session = await ai.live.connect({ model: 'gemini-live-2.5-flash-preview', // or appropriate live model config: { responseModalities: [Modality.AUDIO], inputAudioTranscription: {}, // Enables transcription of user input audio outputAudioTranscription: {}, // Enables transcription of model output audio }, callbacks: { onmessage: (message) => { const sc = message.serverContent; if (sc?.inputTranscription) { console.log('Input transcription:', sc.inputTranscription.text); } if (sc?.outputTranscription) { console.log('Output transcription:', sc.outputTranscription.text); } }, // other callbacks... } }); The AudioTranscriptionConfig is an empty object/marker interface; simply setting {} enables the feature. Transcriptions appear in serverContent.inputTranscription and serverContent.outputTranscription of LiveServerMessage events in the onmessage callback. Note: Ensure you're using a recent version of @google/genai (v0.9.0+) where transcription support was added. For Vertex AI, the config is nested under config as well.



🏁 Script executed:

wc -l packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 112


🏁 Script executed:

cat -n packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 21398


Add inputAudioTranscription and outputAudioTranscription to enable transcript events.

The adapter emits transcript events from inputTranscription and outputTranscription (lines 230, 242), but liveConfig never enables transcription in the Gemini Live API config. Without inputAudioTranscription: {} and outputAudioTranscription: {}, the Gemini API will not populate these fields in the response, causing the transcript events to never fire.

Proposed fix
  const liveConfig: LiveConnectConfig = {
    responseModalities,
+   inputAudioTranscription: {},
+   outputAudioTranscription: {},
    tools: [{
      functionDeclarations: config.tools
    }],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  const liveConfig: LiveConnectConfig = {
    responseModalities,
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    tools: [{
      functionDeclarations: config.tools
    }],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: config.voice
        }
      }
    },
    maxOutputTokens: config.maxOutputTokens !== 'inf' ? config.maxOutputTokens : undefined,
    systemInstruction: config.instructions,
    temperature: config.temperature,
    ...config.providerOptions
  };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 96 - 112,
liveConfig is missing transcription flags so Gemini won't populate
inputTranscription/outputTranscription; update the LiveConnectConfig object
built in the liveConfig variable to include inputAudioTranscription: {} and
outputAudioTranscription: {} (use the same config.providerOptions merge pattern
so you don't overwrite existing keys) so the API will emit transcription data
that the adapter reads when handling inputTranscription/outputTranscription
events.

Comment on lines +179 to +185
function convertFloat32ToInt16(buffer: Float32Array) {
  let l = buffer.length;
  const buf = new Int16Array(l);
  while (l--) {
    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
  }
  return buf.toString();

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

@google/genai session.sendRealtimeInput audio.data expected format for Gemini Live JavaScript

💡 Result:

The expected format for audio.data in session.sendRealtimeInput({ audio: { data: ..., mimeType: ... } }) for the @google/genai JavaScript SDK (Gemini Live) is a base64-encoded string of raw 16-bit PCM audio data (little-endian, typically 16kHz sample rate). Usage example: session.sendRealtimeInput({ audio: { data: audioChunk.toString('base64'), // audioChunk is a Buffer/ArrayBuffer/Uint8Array of raw PCM bytes mimeType: 'audio/pcm;rate=16000' } }); Key details: - Audio format: Raw PCM, 16-bit signed integers, little-endian, mono. - Sample rate: 16kHz native (API resamples if needed). - mimeType: Must include rate, e.g., 'audio/pcm;rate=16000'. - Send in small chunks for real-time streaming. This matches official Google AI for Developers documentation and SDK source patterns.



🏁 Script executed:

# First, let's check if the file exists and read the relevant section
cat -n packages/typescript/ai-gemini/src/realtime/adapter.ts | sed -n '170,200p'

Repository: TanStack/ai

Length of output: 981


🏁 Script executed:

# Search for usages of convertFloat32ToInt16 in the codebase
rg "convertFloat32ToInt16" --type ts --type tsx -B 2 -A 5

Repository: TanStack/ai

Length of output: 82


🏁 Script executed:

# Check for any other audio-related functions in the adapter file
rg "audio|pcm|Audio|PCM" packages/typescript/ai-gemini/src/realtime/adapter.ts -n

Repository: TanStack/ai

Length of output: 3431


Encode microphone PCM as base64-encoded raw bytes, not comma-separated integers.

Int16Array.prototype.toString() returns a comma-separated decimal string like "12,-44,9", but the Gemini Live API expects the audio.data field to contain a base64-encoded string of raw 16-bit PCM bytes with a MIME type like audio/pcm;rate=16000. Passing the current comma-separated format will cause the audio payload to be rejected or misinterpreted server-side, preventing microphone input from decoding correctly.

🐛 Proposed fix
 function convertFloat32ToInt16(buffer: Float32Array) {
-  let l = buffer.length;
-  const buf = new Int16Array(l);
-  while (l--) {
-    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
-  }
-  return buf.toString();
+  const pcm = new Int16Array(buffer.length)
+  for (let i = 0; i < buffer.length; i++) {
+    pcm[i] = Math.min(1, Math.max(-1, buffer[i]!)) * 0x7fff
+  }
+
+  const bytes = new Uint8Array(pcm.buffer)
+  let binary = ''
+  for (const byte of bytes) {
+    binary += String.fromCharCode(byte)
+  }
+  return btoa(binary)
 }
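As a standalone illustration of the corrected encoding (independent of the adapter; `Buffer` is Node-specific, while browsers would use `btoa` over a binary string as in the diff above):

```typescript
// Clamp/scale Float32 samples to Int16, then base64-encode the raw
// little-endian PCM bytes -- the format Gemini Live expects in audio.data.
function float32ToPcm16Base64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Assigning a float to an Int16Array truncates toward zero.
    pcm[i] = Math.min(1, Math.max(-1, samples[i]!)) * 0x7fff
  }
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64')
}

// Full-scale positive, full-scale negative, and silence:
// float32ToPcm16Base64(new Float32Array([1, -1, 0])) === '/38BgAAA'
```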
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 179 -
185, The convertFloat32ToInt16 function currently returns buf.toString()
(comma-separated integers); change it to produce a base64-encoded string of the
raw 16-bit PCM bytes suitable for the Gemini Live API audio.data field (e.g.,
MIME audio/pcm;rate=16000). Convert the Float32Array to an Int16Array (clamping
and scaling as done now), then create a Uint8Array view over the Int16Array's
buffer (ensuring correct endianness), encode that byte array to base64, and
return the base64 string so audio.data contains raw PCM bytes rather than CSV
integers.

Comment on lines +188 to +190
const ai = new GoogleGenAI({
  apiKey: token.token
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does @google/genai require httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token?

💡 Result:

Yes, @google/genai requires httpOptions.apiVersion = 'v1alpha' when connecting to Gemini Live with an ephemeral token. Ephemeral tokens are exclusively compatible with the Live API and mandate the v1alpha API version. Official documentation explicitly states that when using an ephemeral token as the apiKey for client initialization, the http_options must be set to {'api_version': 'v1alpha'}. This is also required for Live API preview features like affective dialog and proactive audio, and for creating the ephemeral tokens themselves.



🏁 Script executed:

# First, let's check the file structure and the specific lines mentioned
head -n 200 packages/typescript/ai-gemini/src/realtime/adapter.ts | tail -n 20

Repository: TanStack/ai

Length of output: 520


🏁 Script executed:

# Let's also search for other GoogleGenAI instantiations in the codebase
rg "new GoogleGenAI" -A 3 packages/typescript/ai-gemini/

Repository: TanStack/ai

Length of output: 873


🏁 Script executed:

# Check the `@google/genai` version in the project to verify SDK compatibility
cat packages/typescript/ai-gemini/package.json | grep -A 5 -B 5 "@google/genai"

Repository: TanStack/ai

Length of output: 264


🏁 Script executed:

# Check more context around the GoogleGenAI initialization to verify it's for Live API
sed -n '180,210p' packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 907


🏁 Script executed:

# Also check the token.ts file to understand the different use case
cat -n packages/typescript/ai-gemini/src/realtime/token.ts | head -n 50

Repository: TanStack/ai

Length of output: 1966


Add v1alpha API version when initializing GoogleGenAI with ephemeral token.

Ephemeral tokens for Gemini Live API are v1alpha-only. When using an ephemeral token as the API key, the client must be initialized with httpOptions: { apiVersion: 'v1alpha' }. Without this configuration, Live API requests will route to the wrong API version and fail.

Proposed fix
  const ai = new GoogleGenAI({
-   apiKey: token.token
+   apiKey: token.token,
+   httpOptions: {
+     apiVersion: 'v1alpha',
+   },
  });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  const ai = new GoogleGenAI({
    apiKey: token.token,
    httpOptions: {
      apiVersion: 'v1alpha',
    },
  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 188 -
190, The GoogleGenAI client is being constructed with an ephemeral token
(token.token) but missing the v1alpha API version; update the initialization of
GoogleGenAI (the ai instance) to include httpOptions: { apiVersion: 'v1alpha' }
when using ephemeral tokens so Live API requests use the v1alpha endpoint, e.g.,
detect where new GoogleGenAI({ apiKey: token.token }) is created and add the
httpOptions.apiVersion flag.
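The fix can be sketched end-to-end. To keep the sketch self-contained, the relevant constructor options are modeled with a hypothetical local interface rather than importing `GoogleGenAI` from `@google/genai`; in the real adapter the class comes from the SDK.

```typescript
// Hypothetical minimal model of the @google/genai constructor options
// used here for illustration; the real SDK exports GoogleGenAI.
interface GoogleGenAIOptions {
  apiKey: string
  httpOptions?: { apiVersion?: string }
}

// Ephemeral Live API tokens are v1alpha-only, so the client options must
// pin the API version whenever the ephemeral token is used as the API key.
function liveClientOptions(ephemeralToken: string): GoogleGenAIOptions {
  return {
    apiKey: ephemeralToken,
    httpOptions: { apiVersion: 'v1alpha' },
  }
}

const opts = liveClientOptions('ephemeral-token-example')
console.log(opts.httpOptions?.apiVersion) // 'v1alpha'
```

In the adapter itself this corresponds to passing the same `httpOptions` object to `new GoogleGenAI(...)` alongside `apiKey: token.token`.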

Comment on lines +466 to +473
sendImage(imageData: string, mimeType: string) {
// Only accepts raw image data, not URLs
session.sendRealtimeInput({
media: {
data: imageData,
mimeType: mimeType,
}
})
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Gemini Live API JavaScript realtime image input mediaChunks vs media field

💡 Result:

In the Gemini Live API for JavaScript (using the @google/genai SDK), realtime image input is sent via the session.sendRealtimeInput method. Images (for vision/video at <=1 FPS; JPEG recommended) are provided as a Blob object with base64-encoded data and a mimeType such as 'image/jpeg'. Example from the official docs:

session.sendRealtimeInput({
  video: { // Use the 'video' field for image frames
    data: frame.toString('base64'),
    mimeType: 'image/jpeg'
  }
});

The underlying protocol (BidiGenerateContentRealtimeInput) previously used a mediaChunks[] array of Blobs for generic media input, but this is deprecated. Use the modality-specific fields instead: 'audio' for PCM audio chunks, 'video' for image frames, or 'text' for text. Multiple mediaChunks are ignored beyond the first, and the new fields provide clearer typing and streaming support.

For non-realtime (context-building) input, use sendClientContent with inlineData parts, e.g.:

session.sendClientContent({ turns: [{ parts: [{ inlineData: { mimeType: 'image/jpeg', data: base64Image } }] }] });

No distinct 'media' field exists in the current realtime input; 'video' handles images. Use sendRealtimeInput for continuous streaming (optimized for low latency, non-deterministic order) and sendClientContent for ordered context addition.

sendImage must use the video field, not media or deprecated mediaChunks.

The sendRealtimeInput() method does not accept a media property. For image frames, use the video field instead. The mediaChunks field is deprecated and should not be used; the current API uses modality-specific fields: video for image frames, audio for PCM audio, and text for text.

🐛 Proposed fix
     sendImage(imageData: string, mimeType: string) {
-      // Only accepts raw image data, not URLs
       session.sendRealtimeInput({
-        media: {
-          data: imageData,
-          mimeType: mimeType,
-        }
+        video: {
+          data: imageData,
+          mimeType,
+        },
       })
       currentMode = 'thinking'
       emit('mode_change', { mode: 'thinking' })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 466 -
473, The sendImage function currently calls session.sendRealtimeInput with a
non-supported media property; update sendImage to pass the image under the
modality-specific video field instead of media or deprecated mediaChunks (e.g.,
call session.sendRealtimeInput({ video: { /* include the image bytes and
mimeType as the frame payload */ } })). Locate sendImage and replace the media
object with a video object containing the image data and mimeType in the API's
expected frame/payload shape so the session.sendRealtimeInput call uses video
rather than media or mediaChunks.
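A minimal sketch of the corrected call shape, using a local stub in place of the real Live API session so the payload can be inspected (the input type is a hypothetical simplification of the SDK's realtime input shape):

```typescript
// Hypothetical simplified shape of the realtime input payload,
// modeling only the modality-specific fields discussed above.
interface RealtimeInput {
  video?: { data: string; mimeType: string }
  audio?: { data: string; mimeType: string }
  text?: string
}

// Stub session that records what sendRealtimeInput receives.
const sent: RealtimeInput[] = []
const session = {
  sendRealtimeInput(input: RealtimeInput) {
    sent.push(input)
  },
}

// Image frames go under the `video` field, not a `media` property.
function sendImage(imageData: string, mimeType: string) {
  session.sendRealtimeInput({
    video: { data: imageData, mimeType },
  })
}

sendImage('aGVsbG8=', 'image/jpeg')
console.log(sent[0].video?.mimeType) // 'image/jpeg'
```

With the real SDK, the same object literal is passed to the live session's sendRealtimeInput; only the stub changes.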

Comment on lines +41 to +42
const model: GeminiRealtimeModel =
options.model ?? 'gemini-live-2.5-flash-native-audio'
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Default model literal used by token adapter:"
rg -n "gemini-live-2\.5-flash-native-audio|gemini-2\.5-flash-native-audio-preview-12-2025" -- packages/typescript/ai-gemini/src/realtime

echo
echo "Declared GeminiRealtimeModel union:"
sed -n '40,60p' packages/typescript/ai-gemini/src/realtime/types.ts

Repository: TanStack/ai

Length of output: 1293


Fix the default Gemini model ID to match the declared type.

options.model ?? 'gemini-live-2.5-flash-native-audio' assigns a string literal that is not in the GeminiRealtimeModel union type. The declared union only includes 'gemini-3.1-flash-live-preview' and 'gemini-2.5-flash-native-audio-preview-12-2025'. Replace with the valid model ID:

Fix
       const model: GeminiRealtimeModel =
-        options.model ?? 'gemini-live-2.5-flash-native-audio'
+        options.model ?? 'gemini-2.5-flash-native-audio-preview-12-2025'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/token.ts` around lines 41 - 42,
The default model string is invalid for the GeminiRealtimeModel union: change
the default assigned to the local variable model (currently set via
options.model ?? 'gemini-live-2.5-flash-native-audio') to one of the allowed
union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.
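The type-level effect of the fix can be shown in isolation. The union below is copied from the review finding's description of realtime/types.ts; the fallback must be one of its members or the assignment fails to type-check:

```typescript
// Union as described in the review finding for realtime/types.ts.
type GeminiRealtimeModel =
  | 'gemini-3.1-flash-live-preview'
  | 'gemini-2.5-flash-native-audio-preview-12-2025'

// Falling back to 'gemini-live-2.5-flash-native-audio' would be a compile
// error here; the default must be a declared member of the union.
function resolveModel(model?: GeminiRealtimeModel): GeminiRealtimeModel {
  return model ?? 'gemini-2.5-flash-native-audio-preview-12-2025'
}

console.log(resolveModel()) // 'gemini-2.5-flash-native-audio-preview-12-2025'
console.log(resolveModel('gemini-3.1-flash-live-preview')) // passthrough
```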

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (1)

35-47: ⚠️ Potential issue | 🟠 Major

Wire options and config parameters to session initialization.

The connect() method accepts both options (from elevenlabsRealtime()) and _config (RealtimeSessionConfig) but passes neither to createElevenLabsConnection(), causing caller-provided configuration to be silently dropped. While ElevenLabsRealtimeOptions includes meaningful fields (connectionMode, debug), and the Gemini adapter demonstrates proper config wiring, the ElevenLabs implementation ignores both parameters. At minimum, rename _config to config to reflect its intended use and thread it through to session creation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts` around lines 35 -
47, The connect implementation of elevenlabsRealtime is dropping the
caller-provided ElevenLabsRealtimeOptions and RealtimeSessionConfig; rename the
unused parameter _config to config in the connect signature and thread both
options (from elevenlabsRealtime) and config (RealtimeSessionConfig) into
createElevenLabsConnection so session initialization receives
connectionMode/debug and session config; update the call sites inside
elevenlabsRealtime.connect to pass (token, config, options, clientToolDefs) or
the equivalent parameter order expected by createElevenLabsConnection and adjust
createElevenLabsConnection invocation accordingly.
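The suggested wiring can be sketched with stand-in types. Everything below is hypothetical for illustration (the real ElevenLabsRealtimeOptions, RealtimeSessionConfig, and createElevenLabsConnection live in the ai-elevenlabs and ai-client packages); the point is that connect must thread both parameters through instead of dropping them:

```typescript
// Hypothetical minimal shapes of the two parameter types involved.
interface ElevenLabsRealtimeOptions {
  connectionMode?: 'websocket' | 'webrtc'
  debug?: boolean
}
interface RealtimeSessionConfig {
  voice?: string
}

// Stand-in for createElevenLabsConnection that records what it received,
// so the test can confirm nothing was silently dropped.
function createConnection(
  token: string,
  config: RealtimeSessionConfig,
  options: ElevenLabsRealtimeOptions,
) {
  return { token, config, options }
}

// connect now names the parameter `config` (not `_config`) and passes
// both it and `options` down to session creation.
function connect(
  token: string,
  config: RealtimeSessionConfig,
  options: ElevenLabsRealtimeOptions,
) {
  return createConnection(token, config, options)
}

const conn = connect('tok', { voice: 'aria' }, { debug: true })
console.log(conn.options.debug, conn.config.voice) // true aria
```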
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts`:
- Around line 35-47: The connect implementation of elevenlabsRealtime is
dropping the caller-provided ElevenLabsRealtimeOptions and
RealtimeSessionConfig; rename the unused parameter _config to config in the
connect signature and thread both options (from elevenlabsRealtime) and config
(RealtimeSessionConfig) into createElevenLabsConnection so session
initialization receives connectionMode/debug and session config; update the call
sites inside elevenlabsRealtime.connect to pass (token, config, options,
clientToolDefs) or the equivalent parameter order expected by
createElevenLabsConnection and adjust createElevenLabsConnection invocation
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9ecd1b6-aed5-4bb5-81bc-c287adfd2e67

📥 Commits

Reviewing files that changed from the base of the PR and between dadfae9 and 0711441.

📒 Files selected for processing (5)
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts

